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Abstract. Traditional binned statistics such as suffer from information loss and arbitrariness of the binning procedure, 
which is especially important at low count rates as encountered in the XMM-Newton Extended Survey of the Taurus Molecular 
Cloud (XEST). We point out that the underlying statistical quantity (the log likelihood L) does not require any binning be- 
yond the one implied by instrumental readout channels, and we propose to use it for low-count data. The performance of 
L in the model classification and point estimation problems is explored by Monte-Carlo simulations of Chandra and XMM- 
Newton X-ray spectra, and is compared to the performances of the binned Poisson statistic (C), Pearson's and Neyman's 
xl/, the Kolmogorov-Smimov, and Kuiper's statistics. It is found that the unbinned log likelihood L performs best with regard 
to the expected chi-square distance between true and estimated spectra, the chance of a successful identification among dis- 
crete candidate models, the area under the receiver-operator curve of reduced (two-model) binary classification problems, and 
generally also with regard to the mean square errors of individual spectrum parameters. The C^n) statistics should only be 
used if more than 10 (15) predicted counts per bin are available. From the practical point of view, the computational cost of 
evaluating L is smaller than for any of the alternative methods if the forward model is specified in terms of a Poisson intensity 
and normalization is a free parameter The maximum-L method is applied to 14 XEST observations, and confidence regions 
are discussed. The unbinned results are compared to binned XSPEC results, and found to generally agree, with exceptions 
explained by instability under re-binning and by background fine structures. In particular, HO Tau is found by the unbinned 
method to be rather cool (kT ~ 0.2 keV), which may be a sign of shock emission. The maximum-L method has no lower limit 
on the available counts, and allows to treat weak sources which are beyond the means of binned methods. 
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1. Introduction 

Energy histograms are often used as spectrum estimators. 
While this procedure has the advantage of simplicity, the 
grouping of counts into a histogram is also associated with 
information loss. This becomes especially important if only 
few counts are available, so that the spectral fine structures 
are sparsely sampled by the observed counts. Such a situa- 
tion arises in the XMM-Newton Extended Survey of the Taurus 
Molecular Cloud (XEST; Gudel et al. 2006a i, thus prompting 
the search for alternative unbinned methods. 

Another motivation for histogram formation is to use 
as a simple and simply coded measure of agreement between 
theory and observation. The role of is thus to provide a 
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plausibility ordering of alternative models, and to assess their 
absolute credibility. Since the histograms follow a multino- 
mial distribution, which becomes gaussian only in the limit 
of infinite sample size, the use of x^ represents an approxima- 
tion and corrections must be applied for small n (e.g., Kendall 
& Stuart 1958 Wachter .1979. Nousek [T^ Mighell [T^ 
Arzner 2004). Alternatively, one may use the binned Poisson 
(C) statistics. There exists, however, a simpler solution. 

Namely, the relevant statistical quantity, the likelihood 
function, can be defined for an unlimited instrumental reso- 
lution without any reference to binning. Using this unbinned 
expression avoids arbitrariness of histogram formation, and 
thus a potential source of discrepant results. It has been suc- 
cessfully applied to ROSAT (Boese & Doebereiner l200 1 1 and 
EGRET (Digel 2000) observations. Related approaches invoke 
piecewise-constant intensity models (Scargle . 19981 see Stelzer 
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et al. 120061 for an application to the XEST) or transformations 
to uniform null hypotheses (Kinoshita l2002l l. and applications 
in other fields than astronomy include particle physics (Baker 
& Cousin 1984) and medicine (Miller et al. 2002). While the 
C statistic is often used in astronomical applications (e.g.. 
Dolphin 2002'for star formation statistics; Babu and Feigelson 
[1996 for a general overview), the unbinned (exact) Poisson 
likelihood is less often used. Alternative unbinned methods 
such as the Kolmogorov-Smirnov or Kuiper's statistics have 
the advantage of being (asymptotically) model-independent, 
which, however, also entails sub-optimal performance if the 
modeling was correct. 

In this article we revisit the issue by Monte-Carlo simula- 
tions of XMM-Newton and Chandra CCD spectra of the Taurus 
Molecular Cloud (TMC). By simulating counts from a known 
spectrum, and applying different statistics in order to find the 
best-fit model, we assess the performances of the various statis- 
tics, and thereby gain a more differentiated picture. The present 
study represents an extension of the work of Nousek (1989) and 
Wachter et al. < 19791 ) to more complex spectral models and a 
broader range of statistics. 

The article is organized as follows. Section 2 introduces 
the models for sources and background spectra. Section 3 dis- 
cusses the counting statistics. Section 4 describes measures 
af agreement between models, and between models and data, 
which can be used to assess the performance of various statis- 
tics. Section 5 describes the Monte-Carlo simulations and their 
numerical results. Real-data applications from the TMC are 
presented in Section 6, where also the issue of confidence 
regions is briefly touched, and the unbinned estimates are 
compared to the binned estimates from the XSPEC software. 
Section 7 contains a summary and conclusions. 

2. Spectral Models 

We start with introducing the spectral models f{E) used in this 
study, which represent the expected number of counts per unit 
energy. 

In modern observations, E can usually not assume con- 
tinuous values but is restricted to a discrete set of instru- 
mental output channels. For example, the FN detector of 
XMM-Newton has by default Nc - 4096 such channels; 
Chandra/ ACIS has Nc = 1024. In both cases, the channel sep- 
aration 6E is much smaller than the instrumental resolution as 
given by the spectral response matrix, so that /(£) is fully re- 
solved. In order to stress the quasi-continuous nature of the 
channel coverage, we shall write f{E)dE rather than fjSEj 
(j - l...Nc) unless stated otherwise. The channels are also 
much finer than the bins typically used when computing the 
C or statistics. In this sense, methods which directly use 
the energy channels will be referred to as 'unbinned', whereas 
methods which group the channels first into larger 'bins' will 
be referred to as 'binned' . 

In what follows we exclusively work with counts and focus 
on the problem of finding the best spectral model for a given 
set of counts. Accordingly, we do not consider here the prob- 
lem of spectral deconvolution, but absorb the instrumental re- 
sponse in the forward model /(£). The symbol E thus denotes 



the observed (channel) energy rather than the incoming photon 
energy. 

2.1. Source 

Our source models assume absorbed collisional ionization 
equilibrium (APEC; Smith et al. ,2001 as implemented in 
XSPEC; Arnaud ri996> . convolved with the instrumental re- 
sponse. The templates are defined on the Nc instrumental chan- 
nels, and depend on only two physical parameters: the (elec- 
tron) temperature kT in units of keV, and the hydrogen col- 
umn density Nh in units of 10^^ cm"^. This simple model is 
adapted to the faint sources addressed by the L statistic. The 
parameters (kT, Nh) are specified on a double logarithmic lat- 
tice with sufficiently dense coverage that intermediate spec- 
tra may be interpolated with error d-^2 < IQ^^ (see Sect. I4.lt . 
This gridding is used for simulations only. The total number of 
expected counts TVsic is an additional parameter The normal- 
ized templates are displayed in Fig. [2 (top panel, selection). 
For better physical clarity, the x-axis refers to channel energy 
rather than to channel number In the Monte-Carlo simulations, 
atomic lines are included assuming a fixed metallicity of 0.2 
times the solar abundances of Anders & Grevesse J1989> . and 
we use the Chandra rather than the XMM-Newton instrumental 
response because of its smaller number of instrumental chan- 
nels, which accelerates the simulations. This choice applies to 
the simulations only, and we return to the XMM-Newton in- 
strumental response and to a more refined abundance model 
when dealing with real XEST data. The function /(£) contains 
a background which is interpolated from calibration observa- 
tions (see Sect. 12. 2> . The astrophysical relevance of the param- 
eters {kT ,Nh, N) is elucidated in the accompanying article of 
Giidel et al. J2006al ). From an empirical point of view, the ef- 
fect of large Nh is to suppress (absorb) the spectrum from at 
low energies, and the effect of large kT is to enhance it mostly 
at high energies. 

2.2. Background 

Although Chandra, owing to its high spatial resolution, has 
little background, the present simulations include an addi- 
tive background for the sake of generality and applicability to 
XMM-Newton data. The background is included in the forward 
model according to 

/(£) = /„c(£) + /bg(£) (1) 

N = Mrc + /^bg . (2) 

The background model fhgiE) is obtained from an extended 
source-free extraction region. Since the background estimate 
is typically much better than the source estimate, the back- 
ground is considered as known; this represents a simplifying 
assumption. The background density' fb^j is determined from 
the observed background histogram //bgj by minimizing the 

' For numerical conciseness, we use here channel-indexed (fj SEj) 
rather than continuous (/(£) dE) to notation. 
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Channel Energy [keV] 

Fig. 1. Top: normalized source models used in this study. An 
example model (black) and its neighbours (dark gray) are high- 
lighted for better clarity. Inlet: the example model and its neigh- 
bours in parameter space (kT,NH)- Bottom: normalized back- 
ground model. 

goal function [/bgj-i^bg,;]' + T.%\cr{E j) 1 6E jf{f^^j^i - 
fbgj]^> where cr(E) - KyfWfkSW represents the approxi- 
mate instrumental resolution. The coefficient k is 0.07 for 
Chandra/AClS, and 0.05 for XMM-Newton/EPlC, and is cho- 
sen such that o-{E) gives the full width half maximum en- 
ergy resolution of the respective instruments. The choice of 
the above goal function amounts to first-order regularization 
and yields a tridiagonal system which is easily solved numeri- 
cally. It ensures^ that f\,gj is unbiased in the (weak) sense that 
'^jfi'sJ ~ 'ZijHbgj, and that f^gj has the approximately cor- 
rect energy resolution in the sense that the response f^^ j to 
a unit pulse //bgj - Sj,k has the mean-square width Y^ji^j ~ 
Euff;jY.jf;gj-2cT\E,). 

To summarize, we use a smooth density estimator for 
the background which interpolates the observed background 
counts, preserves their total number, and has the (approxi- 

^ These properties are most easily seen from a continuous version 
where Jg[(/bg(£') - H(E))^ + cr^(E)f^^(Ef] dE is minimized with re- 
spect to fbg{E). Carrying out the variation (with fbg(E) kept fixed at 
the boundaries) one obtains fbg(E) - (cr^(E)f^^(E)y = H(E). This 
may be integrated to j fbg(E)dE = j H(E)dE, assuming that /'(£) 
vanishes at the boundaries. Multiplying the above differential equa- 
tion by (E - EkY, taking H(E) = 6(E - Et), and assuming that 
cr^(E) varies slower than fbg(_E), one finds j fbg(E)(E - Et)-dE ^ 
2cP'{Ek) f fbg{E}dE by integration by parts, q.e.d. 



mately) correct instrumental resolution. The normalized back- 
ground spectrum of a typical Chandra/AClS observation is 
shown in Fig. [2 (bottom panel). The peak at 1.8 keV is due 
to silicon in the CCD. 

3. Counting statistics 

Photon counting experiments such as XMM-Newton and 
Chandra follow the Poisson statistics, which is briefly recalled 
here. More detailed discussions may be found e.g. in Feller 
([T9681, Eadi e et al. dTOTlt . Sant alo ([T9761 . Wachter lfT979l . 
Reiss J1993> . and Protassov et al. (I2002> . 

3.1. The Poisson process 

We consider throughout a non-homogeneous Poisson process 
in an interval B - (E^in, fimax) of the real axis with finite in- 
tensity /(£). Generalizations to (pointwise) infinite f{E) and 
higher dimensions can be found e.g. in Reiss (,1993j . The 
Poisson process is then characterized by the following two 
properties: (i) the probability of finding nj counts in an sub- 
interval Aj c B is 

e-^'f! r 
Prob(n;) = j-^- where Xj = f{E)dE , (3) 

and (ii) the numbers of counts in disjoint A/s are statisti- 
cally independent of each other The first Equation in Q de- 
fines the Poisson distribution, and Aj is called the parameter of 
the Poisson distribution. 

When counts from different intensity functions fk{E) are 
added, the resulting process is again Poissonian with intensity 

fkiE). This fact may be used to include an observational 
background, and to construct a Poisson process of arbitrary 
overall expectation 

yV= r f{E)dE (4) 
Jb 

by first drawing the number n of counts from a Poisson 
distribution (Eq. |3} with parameter N, and then drawing each 
count Ej, i - 1 ...n from the probability density 

p(E) = f(E)/N (5) 

(see Reiss [T9931 Theorem 1.2.1). The above construction 
relies on the factorization of the number of counts from their 
position on the energy axis. The integral in Eq. is numeri- 
cally approximated as a sum over instrumental channels. 

3.2. Likelihood function and asymptotic forms 

The likelihood function is defined as the probability of finding a 
certain observation given the true model from which the obser- 
vation derives. Given the observed sample size n, the likelihood 
of the observation {Ei , ...,E„} is 

P(£i, E„ I kT,NH) = p(Ei) ■ ... ■ p(E„) . (6) 



Arzner et al.: unbinned ML estimators 



4 

Equation represents a probability density at (E^, ...,E„). 
The shape of the function p{E) is determined by the param- 
eters kT and Nh- 

As outlined above, the normalization N factorizes out, and 
can be estimated from the total number of observed counts 
alone. We shall do this using the maximum-likelihood estima- 
tor (i.e., that N which maximizes e^'^ N" /n\). Since the back- 
ground contribution Nbg is known (Sect. 12. 2> . only the source 
contribution Ns^rc needs to be estimated; this will be done using 
the maximum-likelihood estimator 

= max(0, n - Nhg) ■ (7) 

The exact likelihood (Eq.|6j can be recast in alternative and 
asymptotic forms which are often used in astrophysics. When 
binned into finite bins Aj of predicted content pj - p(E) dE 
containing «y observed counts, the likelihood (given the total 
number of observed counts rij - n) becomes the binomial 
distribution 

P(«i, ...,n!,JkT,NH) = -p"' ■ ... ■ p""" ■ (8) 

In the asymptotic limit (rij — > oo), the standardized variables 
(rij - npj)l -\Jnpj become normal and the quantity - 2y jCy 
becomes;i'^ distributed with («- 1 ) degrees of freedom (Kendall 
11958 ). Accordingly, the log likelihood (logarithm of Eq.|S} be- 
comes chi square distributed with (n - 1) degrees of freedom, 
and we may use the chi square statistic. The approximation of 
the logarithm of a multinomial by a chi square distribution is 
especially good when all the npj are (approximately) equal; in 
this case (and only in this case), the npj need not be large by 
themselves (Wise il963 ). There are different forms of the chi 
square statistics, and we shall return to these - and to alterna- 
tive statistics - in Sect. 14.21 

3.3. Binning 

The choice of optimal bins (Schott n'992> is an important issue 
which affects the outcome of the binned methods. Histograms 
can have uniform or non-uniform bins. While uniform (equal- 
size) bins have the advantage of being independent of the data, 
they are also less adapted and may contain very few counts. 
Non-uniform bins are usually chosen to contain the same (pre- 
dicted) probability mass or observed counts. 

For either (uniform or non-uniform) type of bins, one needs 
to decide on the number of bins or the average number of 
counts per bin. A well-known method to choose the number 
Nb of uniform bins is Sturges' rule (Sturges[l926|l 

Nb = l+\og2n (9) 

where n is the total number of observed counts. Sturges' 
rule is based on the assumption that the counts £, follow a 
normal distribution, and tends to under-estimate the number 
of bins needed to resolve more complicated forms such as the 
f{E) considered here. An alternative approach, which is more 
adapted to the actual shape of p{E), minimizes the asymptotic 
mean square integrated error (AMISE). This implies a trade-off 
between the integrated variance (due to finite counts per bin) 



and the integrated squared bias (due to variation of p(E) across 
the bins). Expanding p(E) to first order in each histogram bin, 
this yields the expression 

= {6/R(p')y'\-^'^ (10) 

for the optimal bin width, where R(p') = p'(E)^dE mea- 
sures the roughness of the spectrum (Scott . 1992 theorem 3.3). 

In our simulations, the optimal number n* of counts per bin 
is either taken according to Eq. (|9} as n* = n/(l + log2 n), or 
according to Eq. ([lOj as n* = n^^^B-\6/{R{p'))y'^ where the 
(unknown) true R(p') has been replaced by an average {R{p')) 
over all models under consideration. The choices based on Eqs. 
(|9} and ilQ\ will be referred to as Sturges' and AMISE meth- 
ods. Alternatively, n* is fixed at a given value, similar as in 
the standard XMM-Afewfon and Chandra data analysis software. 
This will be referred to as fixed-n* method; a typical choice is 
10 counts per bin. (In XSPEC, this becomes > 10 cts/bin for 
high-intensity flares which are not considered here.) For the 
problem considered here ({R{p')) = 2.9 cts/keV^*; N ranging 
from a few 10 to a few 100), the AMISE method gives less 
than 10 counts per bins, whereas Sturges' method gives a few 
to a few ten counts per bin. 

Once the number of counts per bin n* is chosen, the bin- 
ning itself is performed using either uniform or non-uniform 
bins. For the case of non-uniform bins we use by default bins 
with equal numbers of observed counts, with the k-th bin run- 
ning from Eit„- to ^(t+i),,.. This simple method is not optimal 
but adopted for consistency with the standard analysis soft- 
ware. Alternative methods have also been considered, which 
are based on equal predicted bin content, and found to yield 
similar results. 

4. Distance measures 

Next we define measures of agreement between different mod- 
els, and between models and observations. These will allow to 
estimate models from the data, and to assess the discrepancy 
between true and estimated models in the subsequent Monte- 
Carlo simulations. 

4.1. Chi-square distance between models 

Different sets of parameters {kT,NH,N) result in different 
model spectra, whose discrepancy can be measured by a suit- 
able distance in the space of predicted spectra (/-space). It is 
important to realize that it is the distance in /-space which is 
relevant for the observational discrimination between alterna- 
tive models, and not the (say, Euclidean) distance in parameter 
space, because the latter may degenerate solely due to ambigu- 
ous parameterization. Recalling that f(E) is the intensity of a 
Poisson process, and assuming that fi(E) is the true and f2(E) 
is the estimated spectrum, we use here the chi square distance 
(e.g., Gibbs EOOa 

dAfiJ^)^ r ^-^' ~/^^^ dE. (11) 
Jb Ji 

An illustration of dj^i is given in Fig. ^ where the dark 
gray spectra deviate by d-^i < 5 from the reference spectrum 
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(black), assuming that N - 100. The parameters kT and Nh 
are not independent, as indicated by the banana-shaped region 
in Fig. n (inlet). It should be pointed out that it is not this 
kind of degeneracy which is addressed in the present paper, 
but rather the principal capability of various statistics to dis- 
tinguish between models which differ in the sense of Eq. (II 1> . 
Note that d^2{f\,f2) is not a metric since it is not symmetric 
in its arguments, and also violates the triangle inequality with 
respect to the first argument fx (but not with respect to the sec- 
ond argument fi). Alternative probabilistic distances may be 
found e.g. in Basseville (1989), Rachev {1991}, Reiss fl%3l 
Miiller ( 



PT995 1, O' Sullivan et al. ( 1998), Robinson et al. ( I2000t . 
Johnson & Sinanovic (2001 ), and Gibbs & Su (2002i We have 
chosen here Eq. dl 1> because it does not require probabilistic 
normalization of /, and because it is not explicitly adapted to 
the L statistics, the performance of which is to be demonstrated. 

4.2. Distance between models and measurements 

After having defined a measure of agreement between differ- 
ent models, we shall specify measures of agreement between 
models and observations. To this end, we consider the un- 
binned likelihood (Eq. |6j together with a selection of some 
of the statistics which are often used in an astrophysical and 
astronomical context (e.g., Gosset [79871 Wachter et al. 197&. 
Nousek & Shue[L989 Babu & Feigelson [T996l Mighell lT^ 
Metchev lMHl Paltani i2004i : 



L = Y,\np(Ei) (12) 

(=1 

C = yinjlnAj-Aj) with Aj = I dE f{E) (13) 

(14) 



Ni, . , s 



./=1 ^ 

'•"It — 



{Hj + 0) 



D = max|P(£')-5iv(£')| with P{E) 



= J" p{E')dE' 



(15) 



(16) 



V = m&x[P{E) - S n{E)) + m&x[s n{E) - PiE)) . (17) 

Above, L is the unbinned log likelihood (logarithm of Eq. 
|6j, C is the C statistic (Cash 1979 1, and D is the Kolmogorov- 
Smirnov (or uniform) distance between two probability den- 
sities, of which the Kuiper statistic V is a variant with more 
balanced sensitivity across [/imin, £max] (Kuiper [19621 . P{E) 
is the predicted cumulative distribution, and S n(E) is its ob- 
served counterpart, i.e., S NiE) is the number of counts with 
energy smaller than E, normalized by the total number n of 
counts (thus < Sn(E) < 1). A^^, denotes the number of bins 
of the binned methods (Sect. 13. 3> . The subscript 'N' in Eq. 
il5i stands for Neyman; it is undefined for bins with rij - 
which are therefore excluded. (The reference to Neyman is for 
his detailed analysis; the replacement of theoretical variances 



by observed ones has already been proposed by Pearson - see 
Hald 1998). Note that evaluation of L is computationally less 
expensive than evaluation of the binned statistics when there 
are fewer counts than channels and if the model provides p(E) 
rather than P(E). 

It should be pointed out that the selection of statistics (Eqs. 
I12I17> is aimed at our astrophysical application of estimating 
X-ray spectra, and is not free of personal bias. The L statis- 
tic is the one which is primarily addressed by the present ar- 
ticle. The C andx^ statistics are implemented in the standard 
XSPEC software pakage; they are included for benchmarking 
and comparison of observational results. The statistic is an 
obvious variant ofxi,, which is obtained as the asymptotic limit 



of the binned Poisson log likelihood (cf. Sect. 13. 2> . Among the 
unbinned statistics, the Kolmogorov-SmirnovD is probably the 
most fundamental one used in astrophysics. The Kuipers statis- 
tic V has been included as an example of an improved ver- 
sion of D. Others (such as the Anderson-Darling statistic) could 
have been included as well, but we decided to restrict the list in 
order not to overload the diagnostics. 

In order to treat all statistics (I12> - on the same foot- 
ing, the source normalization A/'src is always estimated from Eq. 
(0, and only the shape parameters {kT, Nh) are estimated dif- 
ferently for each statistic. The case Nsk = is rarely met in 
practice. 
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Fig. 2. Schematic construction of a continuous non- 
homogeneous Poisson process by the rejection method: out of 
Np uniform random points in the rectangle B x [0, max(/)] 
only those beneath the curve f(E) are accepted; their abscissae 
constitute the event list (ticks). The number A^p itself is Poisson 
distributed with expectation value \B\ x max(/). 



4.3. Receiver-Operator Characteristics 

Equation (II 1> can be used to measure the discrepancy between 
true and estimated models, and thus to assess the performance 
of the statistics (Eqs.[21-E}- Alternatively, their performance 
can also be characterized in terms of the classical receiver- 
operator characteristics (ROC; Peterson et al. 119541 Hanley 
& McNeill .1982. Michel 2003j . This diagnostics applies to 
the binary classification problem, and visualizes the trade-oflF 
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0.0 0.2 0.4 0.6 0.8 1.0 

false positive rate 

Fig. 3. Receiver-operator characteristics for the binary classifi- 
cation problem with only two models (inlet). 



between the reliability and completeness of detections. In its 
standard formulation, the binary classification problem consid- 
ers two types of objects (healthy and ill patients, for example) 
with a real-valued attribute. The attribute is a random variate, 
the probability density of which depends on the object type. 
The task is to classify the objects by their attributes, which is 
most simply done by applying a threshold. The classification 
is only unique if the probability densities of the attribute do 
not overlap. There are four possible outcomes of the classifica- 
tion procedure, with frequencies indicated in brackets: a type 
1 object may be correctly classified as type 1 (nn) or erro- 
neously as type 2 (ni2), and a type 2 object may be correctly 
classified as type 2 (^22) or erroneously as type 1 (n2i)- From 
these frequencies we define the sensitivity nii/(nii + n^) and 
the specificity n22/(n2i + ^22)- (The asymmetric naming stems 
from associating type 1 and 2 with asymmetric entities like 
healthy and ill patients.) Let us assume that a low attribute is 
typical for type 1, and a high attribute is typical for type 2. If 
the threshold is high, only few objects are accepted, but most 
of them are correctly classified as type 2. However, the bulk of 
type 2 objects is missed since it interferes with type 1. Thus 
the specificity is high but the sensitivity is small. Conversely, 
a low threshold detects all type 2 objects, but at the expense 
of type 1 contamination. In this case, the sensitivity is high but 
the specificity is low. As the threshold is varied, it traces out a 
trade-off between specificity and sensitivity that is traditionally 
displayed in a (1 -specificity) versus sensitivity diagram, equiv- 
alent to false positive rate versus true positive rate. This graph 
is called the receiver-operator characteristics (ROC). 

In order to apply the ROC procedure to the XMM- 
Newton spectrum estimation problem, we restrict the latter to 
only two possible models f\ and /2, and use the log likelihood 
ratio L2 - L\ as an attribute, and similarly C2- C\, X\ ~ x].^ 
x\ 1 ~ A^N 2' ^1 ~ and V\ - V2- By convention, a low at- 
tribute thus hints to model 1 and a high attribute to model 2. 



We then choose at random one of the two models, generate 
an eventUst, and compute the above attributes. This procedure 
is repeated some 10^ {-Yiij^ij) times, and the sensitivity and 
specificity are computed for varying thresholds. The resulting 
ROC curves are shown in Fig. |3l The two models fx and /2 
have the same hydrogen column density Nh - 4.5 and normal- 
ization AT = 30 but their temperatures differ by a factor two, kT\ 
-\A keV and kT2 - 2.5 keV. Such a difference is within astro- 
physical expectations. The chi square distance between the two 
models is d^2{ fi,f2) - 3.2. Different curves represent differ- 
ent statistics. Ideally (100% sensitivity and 100% specificity), 
the ROC curve would go through the upper left corner (0,1) 
of Fig.|3l whereas a completely non-discriminating test would 
yield a straight line along the diagonal from (0,0) to (1,1). The 
actually realized curves are between the two extremes, with the 
exact likelihood coming closest to the ideal point (0,1). 

The ROC curve can be used to assess the performance of 
the statistics ('Eqs.ll2l-I17> in the binary classification problem. 
A commonly used performance measure is the area under the 
ROC curve (AUC). Like the chi square distance, the AUC gives 
a measure of observable discrepancy between two models. For 
Fig. Ewe obtain the AUC's 0.761 (L), 0.705 (C), 0.679 {x\ 
0.655 ixli), 0.699 (£)), and 0.727 (V), thus establishing the or- 
dering L>V>C>D>x^> Xn- 

5. Monte-Carlo simulations 

In order to explore the performance of the unbinned UkeU- 
hood and compare it to the alternative statistics we have con- 
ducted extended Monte-Carlo simulations. The simulations in- 
volve two general steps. In the first step, a model is chosen at 
random and an event list is generated by either the inversion or 
rejection method (Devrove il986i) . The inversion method is de- 
scribed in Appendix A, and the rejection method is illustrated 
in Fig. 13 The inversion method is faster and thus used in most 
simulations; both methods have been checked against the an- 
alytical Poisson probabilities. In the second step, we compare 
the event list with models which are close enough to the true 
model in a d^i sense in order to be potentially successful can- 
didates. This is done using all statistics listed in Eqns. \\2\ - 
(I17> : the decision rule is to accept the model with largest (L, C) 
or smallest ix'^,x\i^ D, V). The above two-step procedure is re- 
peated for many realizations of the event list, and several diag- 
nostics are applied. When the parameter N^^c is varied in the 
simulations, this is done by taking it uniformly distributed out 
of the indicated interval. The background A/bg is kept fixed. 
Several runs addressing different aspects have been performed. 

5.1. Safe discrimination 

In a first family of simulations, we investigate the minimum 
chi-square distance between two models which allows a safe 
distinction on grounds of the observed event list. 

To this end, we work with the discrete set of models of 
Sect. 12. H and call an estimate a 'successful identification' if the 
estimated model equals the true model. We are interested in 
the chance of a successful identification, and proceed as fol- 
lows. For each realization S of the eventUst drawn from a true 
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Fig. 4. The probability of successful identifications versus the 
number of t/^i -ordered candidate models (left), and versus the 
maximal distance of the candidates from the true model (right). 
See text. The simulation has 10 < Mrc < 100, A/bg - 0, and 
the binned methods use uqual-size bins based on Sturges' rule, 
resulting on average in (n*) = 9.1 counts per bin. 



model /i, the candidate models {/2) are ordered by increas- 
ing d^i{f\,f2), and examined sequentially on grounds of the 
statistics M2\ - Mil . The first candidate is always taken as the 
true model. As the number m of candidates increases, the initial 
(successful) estimate may be abandoned in favour of a wrong 
estimate. Let us assume that this happens, under statistics yu and 
for the realization £, at the k-th candidate and define (^{m) to 
be one for m- 1 ... k and zero for m > k. When the simulation 
is repeated for a large number of realizations £ (using different 
/i), the quantity 7?^(m) = S^(m)/Sf,(l) with S ^(m) = 2e ^('w) 
gives the probability of a successful identification in a search 
over m (ij,2 -ordered candidates using statistics /i. 

An example of R/jim) is shown in Fig.0](left), with different 
curves referring to different statistics fi. The binned methods 
invoked Sturges' rule to determine the number of counts per 
(equal-size) bin. As increasingly unlikely candidates become 
included, the R^{m) converge to a constant values. This holds 
true for all statistics; however, there is a clear performance or- 
dering Ri > Rd > Rc > R^^- > Rv ^ Rjf^^ - The exact likelihood 
performs best, followed by the Kolmogorov-Smirnov distance. 

While the quantity R^{m) directly relates to the simulation 
procedure, the number m by itself is not of interest and should 
be replaced by d^i in order to obtain a more meaningful charac- 
teristic. This is achieved by replacing the order indicator 0^(m) 
by a distance indicator 6^{d^2) which switches from unity to 
zero at the distance of the first erroneously accepted candidate, 
and proceeding in the same way as for R^{m). As a result we 
obtain the probability Q^{d^i) of a successful identification in 
a t/y2 -ordered search among (discrete) candidates with maxi- 
mal distance d^i from the true model. The graphs of Q^(d^2) 
are shown in the right pannel of Fig. |3 referring to the same 
simulation as Fig. |3 (left). Like R^{m), Q^{d^i) converges at 
large d^i to a constant value which depends only on the statis- 
tics used. The performance ordering derived from Qn{d^2) is 
the same as for R^{m). 

Aside from the success rates one may ask for the error in 
parameter- and /-space if the identification fails. The closer 
the estimated solution is to the true one, the better the method. 
Figure m displays the residuals in /- and parameter space for 
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Fig. 5. Average /-space (top left) and parameter (other panels) 
deviations between true and estimated models. The number m 
of candidate models delineates the tested volume in parameter 
space. 



the same situation as in Fig.|4](left); the quantities {AkT, ANh) 
are the standard deviations between true and estimated model 
parameters. They give the typical accuracy of best-fit parame- 
ters, averaged over the whole parameter space. As can be seen 
from Fig.|5] the exact likelihood yields the smallest {d^i} (top 
left) and also the smallest parameter errors (AkT, ANh)- The er- 
ror in the normalization estimate (Eq. is simply AM - y/N 
and is not shown. 

The stabilization of the curves observed in Figs. |4] (right) 
and |5] (top left) can be used to define a maximum chi-square 
distance c/T" between true and estimated models above which 
mis-estimation becomes highly unlikely. From Fig.|5|(top left) 
we find that {d^i)^ stabilizes around 3, whereas Fig. 0] (right) 
indicates that erroneous estimates almost never occur for d^i 
15. This behaviour is generic and holds true for a wide range 
of model parameters (Fig. |6|l, suggesting that the chi square 
distance is a useful parameter-free measure of distance between 
two Poisson intensities. We thus select 



t/™" = 50 



(18) 



as a safe cutoff for potentially successful candidate models. 
Candidate models which differ by more than c/T" from the true 
model are not considered. The error introduced by this neglect 
is small; from the decay of the histogram of best-fit distances 
d^i we have estimated that less than 10"^ of all realizations are 
concerned. 
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Fig. 6. Average chi-square distance between true and estimated 
models as a function of the true model parameters (marginal 
distributions). 
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Fig. 7. Similar to Fig.|5lbut with m proportional to the density 
of candidate models in parameter space. 

5.2. Resolution 

In a second family of simulations, the restriction to the discrete 
models of Sect. 12.11 is relaxed by interpolating the models in 
(A;r, A^/f)-space. Equation J18l l is then used to limit the search 
for best-fit candidates. The number m of candidate models thus 
delineates the density of models in parameter space, and as m 
is increased, the best-fit candidates converge to the best-fit so- 
lutions. Figure Q shows the result of this density exploration. 
Note that the {d^i} curves converge at large m to the same val- 
ues as in Fig. |5] confirming that d^^ and the maximal m are 
ample and do not effect the outcome of the simulation. 



5.3. Effect of binning, count rate, and background 

Up to here the binned methods invoked only Sturges' rule (Eq. 
|5J for better comparability. In this Section, the effect of differ- 
ent binning methods and of the bin size is explored. The case of 
a single bin must be excluded, since the predicted bin content 
is then insensitive to the parameters kT and Nh- 
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Fig. 8. The effect of binning on the performances of {C,x^,Xn^- 
The corresponding unbinned L values are shown for compari- 
son. 



In order to demonstrate the effect of the bin size we arbi- 
trarily vary the number of bins A^^ (Fig.|S}. Black curves refer 
to equal-content bins, and gray refer to equal-size bins. The 
curves represent averages over (kT,NH) and 10 < N < 200. 
At small Nb (many counts/bin), the iC,x^,XN) curves collapse 
for both equal-content and equal-size cases. At large Ni, (few 
counts/bin), the performance of C monotonically approaches 
the L limit (black solid line). For large Nb, the x^ and x^ 
statistics are not applicable due to low count rates. As a con- 
sequence, their performance does not approach the C and L 
limit with increasing Nb- This is especially pronounced for the 
equal-sized bins, which generally perform worse than equal- 
content bins. From Fig.|S]one may deduce the minimum num- 
ber of counts per bin at which the chi square statistics apply: 
the increase of the (x^,xlf) curves at Nb ^ 10 indicates that at 
least some 5-20 counts per bin should be present. By narrowing 
and shifting the N-cnt one may explore the AT-dependence of 
the turnaround, and thereby find that x^ and xlf require about 
10 and 15 counts per bin, respectively. 

Complementary to Fig.|8] Fig.|9lshows the effect of N, av- 
eraged over Nb- Again, A^^ is independent of N for demonstra- 
tion purposes. The bins have equal content. The /-space errors 
(top left) are found to be weakly dependent on TV except for the 
xl( statistics, where the neglect of empty bins has a significant 
effect at low count rates {N ^ 40). The errors AkT and ANh 
scale like TV which is inherited from the the normalization 
error ATV ~ TV'^^ (not shown). 

We dismiss now the ad-hoc choice of Nh used in Figs. |8l 
and|9l and return to the better adapted binning methods of Sect. 
13.31 The number of counts per bin is thus determined from the 
Sturges' or AMISE rules, or fixed at n* - 8; the bins have ei- 
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Fig. 9. Dependence of the estimation errors on the true ex- 
pected counts N. 



ther equal size or equal (observed) content. The result of this 
simulation family is summarized in Table showing the per- 
formance ranking based on the mean chi square distance be- 
tween true and estimated models. All simulations involve 10^ 
reaUzations, and the true source normalization is either TVsrc = 
20 or TVsrc = 150, while the parameters {kT, Nh) are continuous 
(interpolated). A background of A/bg = 50 has been included 
for the sake of generality, so that both background-dominated 
and source-dominated situations are addressed. From Tabled 
it is seen that the exact likelihood (L) performs best and the 
Kolmogorov-Smirnov distances {D) performs worst. The rel- 
atively bad performance of D compared to Sect. 15.1 1 is found 
to be caused by the background. Pearson's generally per- 
forms better than Neyman's Kuiper's V performs better 
than Kolmogorov's D. A detailed comparison of pairs of sim- 
ulations differing only by the count rate reveals that the chi 
square distance between true and estimated models is always 
smaller for TV = 150 than for TV = 20, which is expected since 
more counts allow a sharper distinction of the models. Equal- 
content bins generally perform better than equal-size bins. 

When the area under the operator-receiver curve (AUC; 
Sect. 14. 3> is used as an alternative performance measure instead 
of the average chi square distance between true and estimated 
models, we obtain the results of Tab. |2l This is similar to Tab. 
[Ubut with A> B indicating that <AUC)a < 0.99 <AUC>b and 
A > B indicating 0.99<AUC>b < <AUC)a < <AUC)b (like 
small {d^i), small (AUC) indicates close agreement of true and 
estimated models). The average is over 10"* pairs of models 
covering the (continuous) parameter space {kT, Nh), with each 
pair of models being probed by 2 ■ lO'* realizations of Poisson 
data. Although the AUC criterion is quite different in spirit 
from the chi square distance criterion (binary classifier versus 



estimation problems), the performance orderings obtained by 
the two methods are surprisingly similar. In particular, the un- 
binned likelihood (L) performs always best, while the internal 
performance ordering of {C,x^,xli' V) partially changes. It 
was verified that L performs best not only on the average, but 
also for all model pairs individually. 
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Table 1. Performance ordering of the statistics (Eqs. I12I17> . 
based on the average chi square distance between true and 
estimated models. The notation A > B indicates here that 
{d^2}A < Q.95{d^2)B, and A > B indicates that 0.95<d'^:)B < 
{d^i)A < {d^i)B- The labels 'e.s.' and 'e.c' refer to 'equal size' 
and 'equal count' bins, respectively. TVbg = 50. 



method binning A/'s,c 



Sturges 
Sturges 
Sturges 
Sturges 
AMISE 
AMISE 
AMISE 
AMISE 
n*=8.0 
n*=8.0 
;2*=8.0 
;3*=8.0 

Table 2. 

operator 



e.s. 
e.s. 
e.c. 
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Similar to Tab.^ but using the area under the receiver- 
curve (Fig.O as a measure of performance. TVbg = 50. 
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Figures 13 -|5]did not include background (TVbg - 0). In or- 
der to investigate the effect of the background we have repeated 
the simulations with variable background ratio TVbg /TV. It turns 
out that the presence of a background does not affect superior 
performance of the unbinned likelihood L. There are, however, 
differences in the performance ranking of the other statistics; 
in particular, the unbinned D and V statistics are found to be 
degraded by the background. We shall not show here the full 
diagnostic applied to the zero-background case, but restrict our- 
selves to an exemplary result. Figure ^| shows the number 
of successful identifications versus the number of models at 
choice for TVbg = 50, and is to be compared to Fig.|4] The lead 
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Fig. 10. Similar to Fig.0] but including background (Nbg - 50). 

of the L statistic is even more pronounced in the presence of 
background; the performance ordering derived from Fig.llOlis 

L>V>C>x^>xl>D. 

6. Real-data application 

6.1. Observations and fonvard model 

After establishing the superior performance of the unbinned 
likelihood by Monte-Carlo simulations, we apply it to actual 
XMM-Newton observations of the TMC. The data set (Tab. 01 
comprises 14 objects, and has been selected for low numbers 
of observed counts (A^cnt) and simplicity of the spectra so that 
parameterisation by (kT,NH, AT) is adequate. Also, we have as 
far as possible avoided cases where the background is not well 
known, and where the maximum-likelihood parameters lie on 
the boundary of the XSPEC parameter space, which would re- 
quire a more careful analysis (Protassov et al. 20Q2jt- 'Bad' time 
intervals with increased background were omitted. 

The source models are similar as in Fig.^Ctop) but adapted 
for the XMM-Newton instrumental response and background. 
The background spectrum is estimated according to the proce- 
dure of Sect. 12.21 setting k = 0.05 to characterize the approxi- 
mate XMM-Newton/EPlC resolution. Only the FN detector of 
the XMM-Newton/EPlC instrument is used. The abundances, 
relative to solar values (Anders & Grevesse ll989> . are repre- 
sentative for highly active young stars with inverse FIP effect 
(Telleschi et al. .2005. Scelsi et al. 2005 Garcia- Alvarez et al. 
l2005t : He: 1.0 - C: 0.45 - N: 0.788 - O: 0.426 - Ne: 0.832 - 
Mg: 0.263 - Al: 0.50 - Si: 0.309 - S: 0.417- Ar: 0.55 - Ca: 
0.195 - Fe: 0.195 - Ni: 0.195. The total exposure time Texp 
used for the analysis is in the order of a few 10 kiloseconds, 
and is given in Table |3l The best-fit fluxes are converted into 
luminosities Lx using XSPEC and assuming a distance of 140 
pc to the TMC. 

6.2. Best-fit models 

An example observation is shown in Fig. using data of 
Haro 6-13. The source extraction region contains 128 counts 
between 0.3 and 7.5 keV after elimination of bad time intervals, 
whereas the corresponding (scaled) background contribution is 
Nbg = 28.8 counts. The observed counts are marked by ticks 
in Fig. ^2 while the background spectrum /bkg(£) is shown 
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Fig. 11. XMM-NewtonfEPlC observation of Haro 6-13, com- 
prising 128 counts. Solid line: maximum-L solution fsrdE); 
dashed: background spectrum /bkg(£'); ticks: observed counts. 
Inlets: best-fit parameters (crosses) and likelihood profiles 
(contours) at nominal confidence levels 68% (boldface), 90% 
and 99%. See Sect.lO 

by the dashed line. Energies below 0.3 keV are discarded. The 
maximum-likelihood estimator for the source normalization is 
Mic = 99.2 counts. The unbinned log likelihood is computed 
inside the parameter cube of Fig.^2(iiil6t)> and attains its max- 
imum at kT = 3.03 keV, Nh = 0.48 xlO^^ cm^-. These values 
are marked by crosses in the inlets of Fig. ^2 and the corre- 
sponding best-fit model fsidE) is shown by solid line (main 
panel). For comparison, the corresponding minimum-;t'^ pa- 
rameters, using 10 equal-content bins, are given by kT - 2.55 
keV, Nh - 0.48 x 10^^ cm"^. The minimum and maximum 
L-estimates thus agree within errors (see below), but are not 
equal. The parameters [kT, Nh) do not follow an equally sim- 
ple trend, and are more sensitive to the statistics used. The other 
best-fit parameters listed in Table |3l have been obtained in a 
similar way as for Haro 6-13. 

6.3. Confidence regions 

A rough estimate on the errors of the best-fit parameters can be 
obtained from the quantity 2 AL = 2(Ln,ax-i), which is asymp- 
totically chi square distributed with the number of degrees 
of freedom equal to the number of model parameters. Here, 
Lm-dx = max fL is the maximum achievable log likelihood, 
which is associated with the full parameter space of Wilks the- 
orem (Wilk, 1938). By thresholding 2 AL at the or-quantiles of 
a chi square distribution with 2 degrees of freedom, accounting 
for the parameters (kT, Nh), and 1 degree of freedom, account- 
ing for the parameter Nsk approximate confidence domains in 
parameter space are obtained. They are shown by solid con- 
tours in Fig. representing cuts through the likelihood sur- 
faces at confidence levels 68% (boldface), 90%, and 99%. The 
first errors given in Table |3l (outside brackets) represent pro- 
jections of the 68% confidence surfaces obtained in this way. 
Note that the absolute credibility of the best-fit solution (i.e. 
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1 ,+12.7 (+13.6) 
'-^"'- -11.7(-ll.6) 


5.8-102'' 


IRAS 04154+2823"' 


21.89 


24 


, ££+2.34 (+0.99) 
^•""-1.66(-1.01) 


J- ,q+2.81(+1.00) 
-^•^^-4. 19 (-3. 19) 


, 1 ,+5.2 (+4.5) 
-4.5(-5.8) 


7.5- 102** 


IRAS 04154+2823*' 


37.27 


145 


J- „q+2.91(+1.01) 
-^•"'^-2.62 (-2.32) 


J- ,o+2.98(+2.05) 
•'•^°-2.15(-1.06) 


oq ;-+12.2(+10.6) 
°^--'-11.5(-13.6) 


2.3-102'' 



Table 3. Maximum-L parameters deduced from XMM-Newton observations of the TMC. Superscripts a) and b) refer to different 
observations of the same object; Texp is the exposure time, and A^cnt is the observed number of counts (corresponding to Nsk + A/bg)- 
Errors outside brackets represent projections of the 68% likelihood surfaces (Fig.[n]inlets); errors in brackets are obtained from 
Monte-Carlo simulations (Sect. lOt . Luminosities Lx refer to Nh = and the energy band from 0.3 to 10 keV. 



the value of L^^x) is not addressed by the statistic 2 AL, which 
decouples the goodness-of-fit problem from the confidence do- 
main problem. 
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Fig. 12. Comparison of the unbinned estimates with results 
from binned one-temperature XSPEC fits. For kT and Nh, the 
error bars refer to unbinned 68% likelihood projections and 
XSPEC Axl = 1 levels (3 DOF). For Lx, only the best-fit val- 
ues are indicated. 



The parameter errors can also be predicted from Monte- 
Carlo simulations alone, similar as in an instrumental design 
study. To this end, the true and best-fit parameters (under L 
statistic) of more than 10^ samples are classified according to 
the best-fit parameters. For each class (containing about 100 
samples), the means {b) and standard deviations (cr) of the dif- 
ferences between true and best-fit parameters are evaluated, 
and taken as a proxy for the biases and errors of the best-fit pa- 
rameters. The simulation is repeated for each XMM-Newton ob- 
servation, so that the correct background and instrumental re- 
sponse are taken into account. The resulting error bounds b + cr 
are given in parentheses in Table |3 and are to be compared 
with the 68% confidence limits from the likelihood threshold- 
ing method. As can be seen, the Monte-Carlo errors are of the 
same order as those obtained from the likelihood thresholding, 
and the biases are generally in consistent direction. 

In summary, we have used here two complementary char- 
acterizations of parameter errors, based on the log likelihood 
ratio 2 AL and on Monte-Carlo simulations. The errors derived 
from the Monte-Carlo simulations do not invoke the asymp- 
totic assumption underlying Wilks theorem, and are therefore 
better adapted to low count rates. But then, they do not involve 
the actual observation at all, and rely entirely on the assump- 
tion that the true spectrum is correctly modeled by the tem- 
plate spectra and the background model. Any systematic error 
contribution is thus neglected. Therefore, the errors from the 
Monte-Carlo simulation tend to under-estimate the true errors 
and should be considered as lower bounds. As a further test, we 
have checked whether the minimum-;^f^ estimates are within the 
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Fig. 13. Exploration of the effect of energy band and binning on the HO Tau parameter estimates. Left column: energies from 0.3 
to 2 keV; right column: energies from 0.3 to 4 keV. Top row: unbinned likelihood. Middle row: Neyman's chi square with 5 bins. 
Bottom row: Neyman's chi square with 4 bins. The binned data are marked by dotted line. 



errors of the maximum-L solution, and found that this is so for 
all cases when 2 AL is used, and holds true in half of the cases 
when the Monte-Carlo error bounds are invoked (both methods 
referring to 68% confidence level). We shall not pursue here 
a deeper discussion of confidence regions, but terminate with 
a hint to the literature where more refined constructions may 
be found in Eadie et al. ( I1971> . Wachter et al. ( 1979 1, Cousins 
ST995L Porter ( fT996b . Feldman & Cousins til998.) . and Giunti 



6.4. Comparison with binned estimates 

In order to compare the unbinned estimates with the standard 
XSPEC results, the XSPEC iterative fitting package was ap- 
plied to background-subtracted observations with more than 50 
counts, using bins of (at least) 10 counts and default (x^) statis- 
tics. The spectral model is identical to the one used for un- 
binned estimation (single temperature, Nh, abundances). The 
results are summarized in Figure[21 displaying unbinned ver- 
sus binned results. Numerical values for the binned results are 
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tabulated Giidel et al. J2006al l. The crosses are centered at the 
best-fit solutions and cover formal 1-cr errors. For Lx, which 
is a derived quantity, only the best-fit parameters are indicated. 
As can be seen, the agreement between binned and unbinned 
estimators is generally good, except for HO Tau and GN Tau. 

In order to clarify the situation we have created plots of 
HO Tau similar to Fig. for both the L and xl^ statistics, 
and have systematically varied the energy band and binning. 
The binning procedure provides (approximately) equal num- 
ber of observed counts per bin, running from low to high en- 
ergies. Some exemplary results are shown in Figure The 
left column includes energies from 0.3 to 2 keV, and the right 
column includes energies from 0.3 to 4 keV. The top row 
refers to the unbinned likelihood, as quoted in Table |3l and 
Figure El The middle and bottom rows refer to Neyman's 
chi square statistic with 5 and 4 bins, respectively; con- 
taining (10,10,11,9,11), (11,11,11,11,11), (13,13,12,13), and 
(14,14,13,14) counts (fixed-n* method). The binned observed 
spectra are indicated by the dotted crosses, with errors repre- 
senting ± Vn counts. As can be seen, the unbinned likelihood 
has two shallow local maxima: a 'cold' one at kT ^ 0.2 keV 
and Nh ~ 8 x 10^' cm"^, and a 'warm' one at kT ^ 1 keV 
and Nh = 0. While the (cold) maximum-L solution and the L- 
profiles are stable under change of the energy range, the 68% 
;if^-profiles (middle) undergo a transition from a single to two 
regions, and the minimum-;^^^, solution flips from cold to warm 
as the number of bins is decreased (middle to bottom). This in- 
stability is mostly caused by the accumulation of counts around 
0.8 keV, which either fall into a single (middle right) or two 
(bottom) bins. The 9Q%xlf domain is approximately stable un- 
der re-binning, and contains the XSPEC solution (kT - 0.38 
keV,A^H =6.4x lO^'cm^^). 

Based on Figure [O] and the instability of 'warm' solu- 
tions under re-binning, one may conclude that the HO Tau data 
favour 'cold' solutions, all the more so as the 'warm' solutions 
lie on the border of the parameter space (Protassov et al. 2002 1. 
However, from an astrophysical point of view it is not obvious 
how such a cold and absorbed spectrum would arise. A pos- 
sible explanation is discussed in Section We shall therefore 
argue that no definitive conclusion is possible for HO Tau yet, 
and that further observations are needed. 

For the second discrepant case, GN Tau, the binned esti- 
mates yield smaller kT and larger Nh than the unbinned ones 
(the binned estimates lie in the unbinned 90% domain but out- 
side the unbinned 68% domain). This can be traced back to 12 
observed counts below 1 keV; if these are excluded then the 
binned and unbinned best-fit parameters agree within 20%. We 
propose the following interpretation. The presence of counts 
below 1 keV implies low absorption (Nh), provided that they 
cannot be explained by the background. The unbinned back- 
ground model has, at £ < 1 keV, fine-structures which do 
not coincide with the observed counts; hence the latter are at- 
tributed by the unbinned method to the source, entailing low 
Nh and (Fig. ^ inlet) large kT. The binned method, in con- 
trast, assigns more [namely, ^ fbg{E) dE] counts to the back- 
ground; hence, Nh is larger and kT is smaller. The discrepancy 



of GN Tau stems thus from background fine structures at ener- 
gies below 1 keV. 

7. Summary and conclusions 

We have used Monte-Carlo simulations to assess the per- 
formance of the unbinned (exact) Poisson likelihood (L), 
binned Poisson likelihood (C), Pearson's ™d Neyman's 
xlf, Kolmogorov-Smirnov (D), and Kuiper's (V) statistics in 
the model classification and point estimation problems of low- 
count XMM-Newton and Chandra/ ACIS spectra parameterized 
by temperature (kT), hydrogen density (Nh) and normalization 
(AO. By 'unbinned' we mean that no grouping of instrumental 
energy channels is involved, so that the full readout resolution 
is used. The L statistic equals the C statistic on the finest pos- 
sible (instrumental) binning. In all cases, the source normal- 
ization TVsi-c was taken from its maximum-likelihood estimator, 
while the shape parameters (kT, Nh) were taken such as to op- 
timize the different statistics. 

The outcome of the simulations can be summarized as fol- 
lows: 

- The unbinned Poisson likelihood performs best with regard 
to the following measures of performance: (i) the probabil- 
ity of a successful identification in a search over discrete 
t/y2 -ordered candidate models, (ii) the expected chi-square 
distance between continuously indexed true and estimated 
models, (iii) the area under the operator-receiver curves of 
reduced binary (two-model) classifier problems, and (iv) 
and generally also with respect to the mean square errors 
of individual parameter errors. 

- Under the L statistic, two models can on average be dis- 
tinguished if their chi square distance exceeds ~3. Similar 
statements hold for the other statistics {C,x^,xii' E), V) and 
correspondingly larger distances (up to ~5). The chi square 
distance is thus empirically found to be a useful parameter- 
free measure of discrepancy between Poisson intensities. 

- The x^ statistics should not be used unless more than 10 
counts per bin are available, and;^f^ should not be used for 
less than 15 counts. 

We therefore argue that the unbinned Poisson likelihood L 
is beneficial in cases with fewer counts than instrumental read- 
out channels. Using L, the naiTow spectral lines and steep gra- 
dients are taken into account at their exact location, unaffected 
by the bin size. Even more important, any instability of results 
under (arbitrary) choices of bin offsets and -sizes is avoided. 
In our view, this is a major benefit since it eliminates user- 
dependent sources of discrepant results. 

The L statistics has been applied to XEST data, and demon- 
strated to operate under real observational conditions. The 
major selection criteria were spectral simplicity (so that the 
3-parameter model applies), low count rate, and comparably 
well-known background. The parameter space was rigorously 
explored to avoid potential difficulties with iterative optimiza- 
tion. The maximum-L results usually agree with the minimum- 
X^ and minimum-;t^j, values within 68% confidence, but are not 
equal. There are, though, two cases (out of 14) where the un- 
binned and binned estimates disagree: HO Tau and GN Tau. 
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Such disagreement, by itself, is a helpful indicator for potential 
problems in the forward modeling. 

In particular, HO Tau is found by the L method to be rather 
cool (kT ~ 0.2 keV). Such a cool X-ray-emitting plasma would 
be unusual for a T Tauri star, the majority of which have av- 
erage temperatures of > 10 MK. In the present survey domi- 
nant plasma with similarly low temperature is only found in the 
low temperature component of the spectrum of the jet-driving 
sources DG Tau A, GV Tau A, and DP Tau, which may be 
due to shocks in the jet (Gudel et al. 2005 Gudel et al 2006b i. 
However, DG Tau A also has significant plasma at very high 
temperatures (~ 50 MK) and HO Tau has no known jet. A 
dominant low-temperature corona has been found on the near- 
est classical T Tauri star, TW Hya, which is thus far unique in 
this respect. Apparent high densities in this cool plasma suggest 
that collisionally-ionized shocks in columns of material accret- 
ing onto the star may generate the X-ray emission (Kastner et 
al. l2002[ Stelzer and Schmitt 2004). TW Hya is an unusually 
old classical T Tauri star, approximately 10 Myr old. Using the 
Siess et al. (2000) stellar evolution models, HO Tau appears 
also to be a relatively old classical T Tauri star, at 8-9 Myr 
Its mass accretion rate of 1 .3 x 10"^ M© yr"' (White and Ghez 
2001) is several times higher than that of TW Hya. Therefore 
it is plausible that HO Tau may be analogous to TW Hya, in 
which case the bulk of its X-ray emission may be generated by 
accretion shocks. 

On the other hand, we can check whether the derived 
Nh agrees with expectations from measured optical and near- 
infrared extinctions Ay and A j. For general interstellar matter, 
the conversions are Nh ~ 2 x 10^' Ai/ cm"^ and Nh ~ 5.6 x 
10^' Ay cm"^ (see Vuong et al. 2002 and references therein). 
The optical extinction for HO Tau is Ay = 1.11 - 1.13 mag 
(Kenyon & Hartmann .l995i White & Ghezl2001l, and the near- 
IR extinction is Ay = 0.32 - 0.46 mag (Kenyon & Hartmann 
[T9951 Briceno et al. l2002t . thus implying Nh = (1.8 - 2.6) x 
10^' cm"^ under the assumption of standard gas-to-dust ratios. 
These estimates are in better agreement with the "hot" solu- 
tion (Fig. 15) although the error ranges are large. We also note 
that significant deviations from the standard interstellar rela- 
tion between extinction and X-ray absorption are well known 
(e.g., in the jet-driving stars mentioned above). Such deviations 
may point to an "anomalous" gas-to-dust ratio for example as 
a consequence of dust evaporation. 

We also note that both the binned and unbinned methods 
provide low kT and high Nh- The discrepancy is in Lx (see 
Fig. I12> which is a consequence of large uncertainties in the 
spectral integration because most of the soft emission from 
the cool plasma is strongly absorbed. Indeed, if the tempera- 
ture estimate is changed from 0.14 keV (unbinned) to 0.28 keV 
(binned) then Lx drops by about a factor ten. The Lx of the un- 
binned method is indeed rather high given a stellar bolometric 
luminosity of this system of - 0.17 Lq - 6.5 x 10^^ erg s"'. 
Adopting the usual X-ray saturation level of Lx/Lt = 10"^ 
(e.g., Vilhu & Rucinski 1983 1, we would expect a maximum 
Lx = 6.5 X 10^^ erg s more in line with the binned solu- 
tion. The low count-rate of HO Tau unfortunately precludes 
a high-resolution X-ray spectroscopic study to study density- 
sensitive triplet emission-lines from He-like ions of O, Ne and 



Mg, which suggest high densities in the TW Hya plasma. As 
for GN Tau, Ay = 1.17 mag (Luhman 2004), hence we expect 
Nh - 6.6 X 10^' cm"^, very close to the result from the un- 
binned method. 

Although the Monte Carlo simulations suggest that the 
maximum-L estimates are closer to the true values when aver- 
aged over a (hypothetical) ensemble of similar observations, a 
few words of caution are in order The present simulations have 
assumed that the true spectrum is contained in the set of candi- 
date spectra. If this is not the case, or if the Poisson process as- 
sumption is violated, then the use of the most binding L statis- 
tics also introduces the most severe misinterpretations. A major 
cause of flawed models stems from the background estimation. 
We have assumed here that this estimation is perfect, and have 
not considered background errors (e.g., Conrad et al.'2003^, nor 
the more complicated problem of estimating the background 
spectrum jointly with the source spectrum. An imprecise back- 
ground model may be the cause for the discrepancy between 
binned and unbinned estimates found in GN Tau, where the 
most relevant low-energy background (0.3 to 1 keV) contains 
200 counts only, implying about 30% statistical background 
uncertainty at resolution AE ~0.04 keV. Furthermore, we have 
focused here on the choice of likelihood functions for the point 
estimation and model classification problems. Accordingly, our 
treatment of parameterization issues and confidence regions 
was rather crude. In particular, we did not attempt to introduce 
any (Bayesian) a priori information other than implicit in the 
choice of the forward models. 

Finally, it should be pointed out that the unbinned ap- 
proach is in principle not restricted to the one-dimensional 
spectrum problem considered in this article, and that the en- 
ergy tags might be replaced by tuples of photon energy, arrival 
time, and position on the detector However, the correspond- 
ing forward models would involve many more degrees of free- 
dom, to the point where a simple maximum-likelihood princi- 
ple might no longer be sufficient to uniquely determine the so- 
lution. Additional information (such as a Bayesian prior) would 
then be required in order to regularize the problem. Also, the 
more elaborated forward models would be more vulnerable to 
systematic errors (i.e., from CCD boundaries). In view of the 
(sparse) data we shall not pursue these issues. 

Appendix A: Generation of non-homogeneous 
Poisson variates by the inversion method 

This Appendix describes the inversion method (Lewis & 
Shedler 119791 Devroye p986 1, one of the two methods used 
here to simulate event lists. The inversion method, in the 
present implementation, relies on Theorem 1 .4 (Chapter 6) of 
Devroye ( 1986) stating that if < Xi < X2 < ... is a homo- 
geneous Poisson process with unit rate function and A(x) is a 
non-decreasing function with A(0) - then < A^\Xi) < 
^ '(^2) < •■■ is a non-homogeneous Poisson process with cu- 
mulative rate function A(ji:). This follows from the fact that if 
F is a continuous (cumulative) distribution with inverse F"' 
and U is uniformly distributed in the unit interval [0,1], then 
F^\U) has (cumulative) distribution F. Our numerical imple- 
mentation makes use of the monotony of A(x) and the fact that 
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Xi+i - X, is exponentially distributed, which allows a succes- 
sive computation of A"'(X,). It proceeds as follows. Let a rate 
function A(x) be specified by a sufficiently resolved discrete 
version Aj with j ranging from to A^^ - 1, and define the dis- 
crete cumulative rate function by Aq - tAq, Aj - Ay_i + tAi, 
with T = N^^(Nc - 1) and N - E^o' '^j- Then, an non- 
decreasing bin number sequence rit e [0, 1 , . . . , A^c - 1 ] of a non- 
homogeneous Poisson process with intensity Aj is obtained by 
the following pseudocode, 

1. Initialisation: 
7 = 

k^Q 

X = -TlnUo 

2. Iteration: 

while X < (Nc - l)do 
while (Aj < X) and < A^^) do 

;■ = ;■ + 1 

end do 

nk = 

X = X-T\nUk 
k^k+l 
end 

where Uk are uniform random numbers in [0, 1], so that 
-T In Uk is exponentially distributed. It was numerically ver- 
ified that the obtained in the above way are Poisson dis- 
tributed with intensity /l,,^ . 
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